Cube Index: A Text Index Model for Retrieval and Mining

Author

  • A. V. Reddy

Abstract

Text retrieval, analysis, mining, and knowledge management have gained importance at a time when we drown in information but starve for knowledge. In this paper, we propose a novel index that uses a text cube model to store text information, similar to a data cube in data mining. The model combines a direct index, a next-word index, and an inverted index in a single, three-dimensional Cube Index. The dimensions are first word, next word, and document. The measure of the cube is the frequency of occurrence of each word/next-word pair. The Cube Index has been tested by modifying the open-source code of Terrier 2.1.
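The three dimensions and the frequency measure described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation and not Terrier's API: the function names (`build_cube`, `next_word_view`, `inverted_view`, `direct_view`), the whitespace tokenizer, and the sparse dictionary layout are all assumptions made for illustration.

```python
from collections import defaultdict


def build_cube(docs):
    """Build a sparse cube keyed by (first_word, next_word, doc_id).

    `docs` maps a document id to its text. The stored measure is the
    number of times the (first_word, next_word) pair occurs in that
    document. Tokenization is a plain lowercase whitespace split,
    which is an assumption of this sketch.
    """
    cube = defaultdict(int)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        # Every adjacent token pair contributes one count to its cell.
        for first, nxt in zip(tokens, tokens[1:]):
            cube[(first, nxt, doc_id)] += 1
    return dict(cube)


def next_word_view(cube, first_word):
    """Next-word slice: words that follow `first_word`, summed over documents."""
    view = defaultdict(int)
    for (first, nxt, _doc), freq in cube.items():
        if first == first_word:
            view[nxt] += freq
    return dict(view)


def inverted_view(cube, word):
    """Inverted slice: documents where `word` occurs as a first word.

    Note: the final token of a document never starts a pair, so it is
    not visible in this view. Handling that case is left out here.
    """
    view = defaultdict(int)
    for (first, _nxt, doc), freq in cube.items():
        if first == word:
            view[doc] += freq
    return dict(view)


def direct_view(cube, doc_id):
    """Direct slice: all word pairs of one document with their frequencies."""
    return {(first, nxt): freq
            for (first, nxt, doc), freq in cube.items()
            if doc == doc_id}


if __name__ == "__main__":
    docs = {"d1": "the cat sat on the mat", "d2": "the cat ran"}
    cube = build_cube(docs)
    print(next_word_view(cube, "the"))
    print(inverted_view(cube, "cat"))
    print(direct_view(cube, "d2"))
```

Each view is a slice or roll-up of the same cube, which is the sense in which one structure can serve as a direct, next-word, and inverted index. A dictionary scan is used here only for brevity; a practical index would need a layout that avoids scanning every cell.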

Similar resources

Full-Text Search Engines for Databases

Current databases can store several terabytes of free-text documents. From the user's viewpoint, the main purpose of a database is efficient information retrieval. For textual data, information retrieval mostly concerns the selection and ranking of documents. The selection criteria can contain elements that apply to the content or the grammar of the language. In the tradi...


A model for predicting dynamic frothability index of dual-frother blends

The dynamic frothability index (DFI) is a characteristic of a frother that provides useful information about its frothing properties. The objective of this study is to introduce a prediction model for estimating the DFI value of dual-frother blends. The model uses the DFIs of the frothers and the mole ratio of the weaker frother to calculate the blend’s DFI. The model reliability was confirmed by comparing the exper...


A genetic algorithm for text mining

Text workers should find ways of representing huge amounts of text in a more compact form. Textual documents can be represented by concepts. One way to define the concepts is by terms: keywords extracted from the textual documents and cleaned by processes such as stopword removal and stemming. Using the frequencies of the terms, one can quantify the relations between documents or portions of ...


An Improved Algorithm of Bayesian Text Categorization

Text categorization is a fundamental methodology of text mining and has been a hot research topic in data mining and web mining in recent years. It plays an important role in building traditional information retrieval, web indexing architecture, web information retrieval, and so on. This paper presents an improved algorithm of text categorization that combines the feature weighting technique with...


Image retrieval using the combination of text-based and content-based algorithms

Image retrieval is an important research field which has received great attention in the last decades. In this paper, we present an approach to image retrieval based on the combination of text-based and content-based features. Keywords are used as the text-based features, and color and texture are used as the content-based features. Query in this system contains some keywords and an input...


Publication date: 2010